{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "6330a8a6",
      "metadata": {
        "id": "6330a8a6"
      },
      "source": [
        "Copyright (C) 2021-2022, Intel Corporation\n",
        "\n",
        "SPDX-License-Identifier: Apache-2.0"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "aa6f534b",
      "metadata": {
        "id": "aa6f534b"
      },
      "source": [
        "# Object detection with tiny YOLOv2 in Python using OpenVINO™ Execution Provider:\n",
        "\n",
        "1. The Object detection sample uses a tinyYOLOv2 Deep Learning ONNX Model from the ONNX Model Zoo.\n",
        "\n",
        "\n",
        "2. The sample involves presenting an Image to ONNX Runtime (RT), which uses the OpenVINO™ Execution Provider to run inference on various Intel hardware devices as mentioned before and perform object detection to detect up to 20 different objects like birds, buses, cars, people and much more.\n",
        "\n",
        "The source code for this sample is available [here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/OpenVINO_EP/tiny_yolo_v2_object_detection)."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "b2c9aa57",
      "metadata": {
        "id": "b2c9aa57"
      },
      "source": [
        "### Install Requirements"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "809559b8",
      "metadata": {
        "id": "809559b8"
      },
      "outputs": [],
      "source": [
        "!pip -q install --upgrade pip\n",
        "!pip -q install folium==0.2.1\n",
        "!pip -q install imgaug==0.2.6\n",
        "!pip -q install certifi==2022.5.18.1\n",
        "!pip -q install flatbuffers==2.0\n",
        "!pip -q install numpy==1.21.6\n",
        "!pip -q install onnx==1.11.0\n",
        "!pip -q install opencv-python==4.5.5.64\n",
        "!pip -q install scipy==1.7.3\n",
        "!pip -q install typing-extensions==4.1.1\n",
        "!pip -q install onnxruntime-openvino"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "24d73115",
      "metadata": {
        "id": "24d73115"
      },
      "source": [
        "### Import Necessary Resources"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "6f72e3ca",
      "metadata": {
        "id": "6f72e3ca"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import onnxruntime as rt\n",
        "import cv2\n",
        "import time\n",
        "import os\n",
        "from pathlib import Path\n",
        "import argparse\n",
        "import platform\n",
        "from google.colab.patches import cv2_imshow\n",
        "import sys"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "9399694a",
      "metadata": {
        "id": "9399694a"
      },
      "source": [
        "### Get the model and input"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "e2b38cc9",
      "metadata": {
        "id": "e2b38cc9"
      },
      "outputs": [],
      "source": [
        "#Create List of files in the directory\n",
        "files = os.listdir('.')\n",
        "\n",
        "#Get the neccesary files into the directory if they don't already exist\n",
        "if ('tinyyolov2-8.onnx' not in files):\n",
        "  !wget https://github.com/onnx/models/blob/main/vision/object_detection_segmentation/tiny-yolov2/model/tinyyolov2-8.onnx?raw=true -O tinyyolov2-8.onnx\n",
        "if ('dog.bmp' not in files):\n",
        "  !wget https://storage.openvinotoolkit.org/data/test_data/images/512x512/dog.bmp"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "ceff9a38",
      "metadata": {
        "id": "ceff9a38"
      },
      "source": [
        "## Preprocess"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "cb2dd5ca",
      "metadata": {
        "id": "cb2dd5ca"
      },
      "source": [
        "### Reshape the input to align with the model\n",
        "\n",
        "\n",
        "When we are using a pre-trained model, which is trained & fine-tuned using a fixed image size as input, we should resize our image to a shape which is expected by the model. The image is re-shaped to the desired image size i.e. $(416 \\times 416)$ using the `opencv` package. Here this is acheived by the `image_preprocess` helper function.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "34a2817f",
      "metadata": {
        "id": "34a2817f"
      },
      "outputs": [],
      "source": [
        "def image_preprocess(frame):\n",
        "  in_frame = cv2.resize(frame, (416, 416))\n",
        "  preprocessed_image = np.asarray(in_frame)\n",
        "  preprocessed_image = preprocessed_image.astype(np.float32)\n",
        "  preprocessed_image = preprocessed_image.transpose(2,0,1)\n",
        "  #Reshaping the input array to align with the input shape of the model\n",
        "  preprocessed_image = preprocessed_image.reshape(1,3,416,416)\n",
        "  return preprocessed_image"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "6dbc9aa8",
      "metadata": {
        "id": "6dbc9aa8"
      },
      "source": [
        "### Check file paths\n",
        "\n",
        "`check_model_extention` is a helper function which checks if the model is present in the location specified. It also validates the model by checking the model file extension.  \n",
        "The expected model file should be of `<model_name>.onnx` format."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "ca5d9356",
      "metadata": {
        "id": "ca5d9356"
      },
      "outputs": [],
      "source": [
        "def check_model_extension(fp):\n",
        "  # Split the extension from the path and normalise it to lowercase.\n",
        "  ext = Path(fp).suffix.lower()\n",
        "\n",
        "  # Now we can simply use != to check for inconsitencies with model file.\n",
        "  if(ext != \".onnx\"):\n",
        "    raise NameError(fp, \"is an unknown file format. Use the model ending with .onnx format\")\n",
        "  \n",
        "  if not Path(fp).exists():\n",
        "    raise OSError(\"[ ERROR ] Path of the onnx model file is Invalid\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "bfc0e076",
      "metadata": {
        "id": "bfc0e076"
      },
      "source": [
        "## Postprocess"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "fca5634d",
      "metadata": {
        "id": "fca5634d"
      },
      "source": [
        "### Add the appropriate bounding boxes and the class label to the image based on the inference results"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7a331243",
      "metadata": {
        "id": "7a331243"
      },
      "outputs": [],
      "source": [
        "def postprocess_output(out, frame, x_scale, y_scale):\n",
        "  out = out[0][0]\n",
        "  num_classes = 20\n",
        "  anchors = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]\n",
        "  \n",
        "  # color look up table for different classes for object detection sample\n",
        "  clut = [(0,0,0),(255,0,0),(255,0,255),(0,0,255),(0,255,0),(0,255,128),\n",
        "        (128,255,0),(128,128,0),(0,128,255),(128,0,128),\n",
        "        (255,0,128),(128,0,255),(255,128,128),(128,255,128),(255,255,0),\n",
        "        (255,128,128),(128,128,255),(255,128,128),(128,255,128),(128,255,128)]\n",
        "\n",
        "  # 20 labels that the tiny-yolov2 model can do the object_detection on\n",
        "  label = [\"aeroplane\",\"bicycle\",\"bird\",\"boat\",\"bottle\",\n",
        "            \"bus\",\"car\",\"cat\",\"chair\",\"cow\",\"diningtable\",\n",
        "            \"dog\",\"horse\",\"motorbike\",\"person\",\"pottedplant\",\n",
        "            \"sheep\",\"sofa\",\"train\",\"tvmonitor\"]\n",
        "\n",
        "  existing_labels = {l: [] for l in label}\n",
        "\n",
        "  #Inside this loop we compute the bounding box b for grid cell (cy, cx)\n",
        "  for cy in range(0,13):\n",
        "    for cx in range(0,13):\n",
        "      for b in range(0,5):\n",
        "      # First we read the tx, ty, width(tw), and height(th) for the bounding box from the out array, as well as the confidence score\n",
        "        channel = b*(num_classes+5)\n",
        "        tx = out[channel  ][cy][cx]\n",
        "        ty = out[channel+1][cy][cx]\n",
        "        tw = out[channel+2][cy][cx]\n",
        "        th = out[channel+3][cy][cx]\n",
        "        tc = out[channel+4][cy][cx]\n",
        "\n",
        "        x = (float(cx) + sigmoid(tx))*32\n",
        "        y = (float(cy) + sigmoid(ty))*32\n",
        "        w = np.exp(tw) * 32 * anchors[2*b]\n",
        "        h = np.exp(th) * 32 * anchors[2*b+1]\n",
        "\n",
        "        #calculating the confidence score\n",
        "        confidence = sigmoid(tc) # The confidence value for the bounding box is given by tc\n",
        "        classes = np.zeros(num_classes)\n",
        "        for c in range(0,num_classes):\n",
        "          classes[c] = out[channel + 5 +c][cy][cx]\n",
        "          # we take the softmax to turn the array into a probability distribution. And then we pick the class with the largest score as the winner.\n",
        "          classes = softmax(classes)\n",
        "          detected_class = classes.argmax()\n",
        "          # Now we can compute the final score for this bounding box and we only want to keep the ones whose combined score is over a certain threshold\n",
        "          if 0.75 < classes[detected_class]*confidence:\n",
        "            color =clut[detected_class]\n",
        "            x = (x - w/2)*x_scale\n",
        "            y = (y - h/2)*y_scale\n",
        "            w *= x_scale\n",
        "            h *= y_scale\n",
        "               \n",
        "            labelX = int((x+x+w)/2)\n",
        "            labelY = int((y+y+h)/2)\n",
        "            addLabel = True\n",
        "            lab_threshold = 100\n",
        "            for point in existing_labels[label[detected_class]]:\n",
        "              if labelX < point[0] + lab_threshold and labelX > point[0] - lab_threshold and \\\n",
        "                 labelY < point[1] + lab_threshold and labelY > point[1] - lab_threshold:\n",
        "                 addLabel = False\n",
        "            #Adding class labels to the output of the frame and also drawing a rectangular bounding box around the object detected.\n",
        "            if addLabel:\n",
        "              bbox_mess = f'{label[detected_class]}:{classes[detected_class]*confidence:.3f}'\n",
        "              cv2.rectangle(frame, (int(x),int(y)),(int(x+w),int(y+h)),color,2)\n",
        "              cv2.rectangle(frame, (int(x),int(y-13)),(int(x)+9*len(bbox_mess),int(y)),color,-1)\n",
        "              cv2.putText(frame,bbox_mess,(int(x)+2,int(y)-3),cv2.FONT_HERSHEY_COMPLEX,0.4,(255,255,255),1)\n",
        "              existing_labels[label[detected_class]].append((labelX,labelY))\n",
        "              print(f'{label[detected_class]} detected in frame with {classes[detected_class]*confidence*100}% probability ')"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "86200685",
      "metadata": {
        "id": "86200685"
      },
      "source": [
        "### Show the image with the bounding boxes"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "4ff22f1f",
      "metadata": {
        "id": "4ff22f1f"
      },
      "outputs": [],
      "source": [
        "def show_bbox(device, frame, inference_time):\n",
        "  cv2.putText(frame,device,(10,20),cv2.FONT_HERSHEY_COMPLEX,0.5,(255,255,255),1)\n",
        "  frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n",
        "  frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n",
        "  cv2_imshow(frame)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "2a609551",
      "metadata": {
        "id": "2a609551"
      },
      "source": [
        "### sigmoid and softmax functions to make sense of infrence output for the model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7b22f9ad",
      "metadata": {
        "id": "7b22f9ad"
      },
      "outputs": [],
      "source": [
        "def sigmoid(x, derivative=False):\n",
        "  return x*(1-x) if derivative else 1/(1+np.exp(-x))\n",
        "\n",
        "def softmax(x):\n",
        "  score_mat_exp = np.exp(np.asarray(x))\n",
        "  return score_mat_exp / score_mat_exp.sum(0)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "xEQDTN4xKgfs",
      "metadata": {
        "id": "xEQDTN4xKgfs"
      },
      "source": [
        "## Inference"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "7617f559",
      "metadata": {
        "id": "7617f559"
      },
      "source": [
        "### Create a session for inference based on the device selected\n",
        "\n",
        "Inferencing using OpenVINO Execution provider under ONNX-Runtime, is performed using the below simple steps:\n",
        "    \n",
        "1. Create a ONNX Runtime Session Option instance using `onnxruntime.SessionOptions()`\n",
        "2. Using the session options instance create a Inference Session object by passing the model and the execution provider as arguments.\n",
        "Execution Providers are the hardware device options e.g. CPU, GPU and NPU on which the session will be executed.\n",
        "\n",
        "The below `create_sess` function actually takes care of the above steps. All we need to do is pass the device arguement to it. It'll return the appropriate session according to the selected device along with the input name for the model.\n",
        "\n",
        "The device option should be chosen from any one of the below options:  \n",
        "- `cpu, CPU, GPU, NPU`"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "438b7b76",
      "metadata": {
        "id": "438b7b76"
      },
      "outputs": [],
      "source": [
        "def create_sess(device):\n",
        "  so = rt.SessionOptions()\n",
        "  so.log_severity_level = 3\n",
        "  if (device == 'cpu'):\n",
        "    print(\"Device type selected is 'cpu' which is the default CPU Execution Provider (MLAS)\")\n",
        "    #Specify the path to the ONNX model on your machine and register the CPU EP\n",
        "    sess = rt.InferenceSession(model, so, providers=['CPUExecutionProvider'])\n",
        "  elif (device == 'CPU' or device == 'GPU' or device == 'NPU'):\n",
        "    #Specify the path to the ONNX model on your machine and register the OpenVINO EP\n",
        "    sess = rt.InferenceSession(model, so, providers=['OpenVINOExecutionProvider'], provider_options=[{'device_type' : device}])\n",
        "    print(\"Device type selected is: \" + device + \" using the OpenVINO Execution Provider\")\n",
        "    '''\n",
        "    other 'device_type' options are: (Any hardware target can be assigned if you have the access to it)\n",
        "    'CPU', 'GPU', 'NPU'\n",
        "    '''\n",
        "  else:\n",
        "    raise Exception(\"Device type selected is not [cpu, CPU, GPU, NPU]\")\n",
        "\n",
        "  # Get the input name of the model\n",
        "  input_name = sess.get_inputs()[0].name\n",
        "\n",
        "  return sess, input_name"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "d1a39044",
      "metadata": {
        "id": "d1a39044"
      },
      "source": [
        "### Specify the device and path to model, Image"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "39fec459",
      "metadata": {
        "id": "39fec459"
      },
      "outputs": [],
      "source": [
        "model = \"tinyyolov2-8.onnx\"\n",
        "image = \"dog.bmp\""
      ]
    },
    {
      "cell_type": "markdown",
      "id": "3df2a717",
      "metadata": {
        "id": "3df2a717"
      },
      "source": [
        "### Validate model file path"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "4d29f8f8",
      "metadata": {
        "id": "4d29f8f8"
      },
      "outputs": [],
      "source": [
        "check_model_extension(model)"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "j5QfFB1PJ_uq",
      "metadata": {
        "id": "j5QfFB1PJ_uq"
      },
      "source": [
        "### Setup input and output"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "DX-C07_3zcck",
      "metadata": {
        "id": "DX-C07_3zcck"
      },
      "outputs": [],
      "source": [
        "if (image):\n",
        "    \n",
        "    #Check if image file exists\n",
        "    if not Path(image).is_file():\n",
        "        raise OSError(\"Input image file \", image, \" doesn't exist as a file\")\n",
        "\n",
        "    # Open the image file\n",
        "    cap = cv2.imread(image)\n",
        "\n",
        "    height, width, c = cap.shape\n",
        "    x_scale = float(width)/416.0  #In the document of tino-yolo-v2, input shape of this network is (1,3,416,416).\n",
        "    y_scale = float(height)/416.0\n",
        "\n",
        "    output_file = Path(image).stem+'_tiny_yolov2_out_py.jpg'"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "2SgMAeXVfQrR",
      "metadata": {
        "id": "2SgMAeXVfQrR"
      },
      "source": [
        "### Run the inference with default CPU Execution Provider <a name=\"cpu_exec\"></a>\n",
        "\n",
        "Now the `tinyyolov2-8.onnx` model will run inference on `cat.jpg` image using the below two execution providers:\n",
        "- `cpu`: default CPU Execution Provider (MLAS)  \n",
        "- `CPU`: Execution on CPU with OpenVino Execution Provider\n",
        "\n",
        "The below code block performs the following operations:\n",
        "\n",
        "1. Creates a Onnx Runtime Session using `cpu` as device\n",
        "2. Loads the input image and performs pre-processing using `image_preprocess` function\n",
        "3. Performs prediction on the same image for $100$-times\n",
        "4. Calculates average inference time\n",
        "5. Performs post-processing, non-max suppression & bounding-box drawing using the predicted outputs from the model\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "1sssvylKfR_0",
      "metadata": {
        "id": "1sssvylKfR_0"
      },
      "outputs": [],
      "source": [
        "### create a session with cpu which is the default CPU Execution Provider\n",
        "sess, input_name = create_sess(\"cpu\")\n",
        "\n",
        "#capturing one frame at a time from the video feed and performing the inference\n",
        "frame = cap.copy()\n",
        "\n",
        "#preprocessing the input frame and reshaping it.\n",
        "#In the document of tiny-yolo-v2, input shape of this network is (1,3,416,416). so we resize the model frame w.r.t that size.\n",
        "preprocessed_image =  image_preprocess(frame)\n",
        "\n",
        "print(\"PREDICTION - BEGIN\")\n",
        "#warmup\n",
        "sess.run(None, {input_name: preprocessed_image})\n",
        "\n",
        "start = time.perf_counter()\n",
        "#Running the session by passing in the input data of the model\n",
        "for i in range(100):\n",
        "  out = sess.run(None, {input_name: preprocessed_image})\n",
        "\n",
        "end = time.perf_counter()\n",
        "inference_time = end - start\n",
        "print(f'Avg Inference time in ms: {(inference_time/100 * 1000)}')\n",
        "print(\"PREDICTION - END\")\n",
        "\n",
        "#Get the output\n",
        "postprocess_output(out, frame, x_scale, y_scale)\n",
        "\n",
        "#Show the output\n",
        "show_bbox(\"cpu\", frame, inference_time)\n",
        "\n",
        "#Write the frame with the detection boxes\n",
        "cv2.imwrite(output_file, frame.astype(np.uint8))\n",
        "print('\\n')"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "XUkRaU9uI0zX",
      "metadata": {
        "id": "XUkRaU9uI0zX"
      },
      "source": [
        "### Run the inference with OpenVINO Execution Provider\n",
        "The below code block performs the same opertions as [before](#cpu_exec) with `CPU` as device, that runs on OpenVINO Execution Provider for ONNX Runtime."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "8b5517d1",
      "metadata": {
        "id": "8b5517d1"
      },
      "outputs": [],
      "source": [
        "### create a session with CPU_FP32 using the OpenVINO Execution Provider\n",
        "sess, input_name = create_sess(\"CPU\")\n",
        "\n",
        "#capturing one frame at a time from the video feed and performing the inference\n",
        "frame = cap.copy()\n",
        "\n",
        "#preprocessing the input frame and reshaping it.\n",
        "#In the document of tiny-yolo-v2, input shape of this network is (1,3,416,416). so we resize the model frame w.r.t that size.\n",
        "preprocessed_image =  image_preprocess(frame)\n",
        "\n",
        "print(\"PREDICTION - BEGIN\")\n",
        "#warmup\n",
        "sess.run(None, {input_name: preprocessed_image})\n",
        "\n",
        "start = time.perf_counter()\n",
        "#Running the session by passing in the input data of the model\n",
        "for i in range(100):\n",
        "  out = sess.run(None, {input_name: preprocessed_image})\n",
        "\n",
        "end = time.perf_counter()\n",
        "inference_time = end - start\n",
        "print(f'Avg Inference time in ms: {(inference_time/100 * 1000)}')\n",
        "print(\"PREDICTION - END\")\n",
        "\n",
        "#Get the output\n",
        "postprocess_output(out, frame, x_scale, y_scale)\n",
        "\n",
        "#Show the output\n",
        "show_bbox(\"CPU\", frame, inference_time)\n",
        "\n",
        "#Write the frame with the detection boxes\n",
        "cv2.imwrite(output_file, frame.astype(np.uint8))\n",
        "print('\\n')"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "OVEP_tiny_yolov2_obj_detection_sample.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3.8.10 ('tiny_yolovV2': venv)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    },
    "vscode": {
      "interpreter": {
        "hash": "0226610a830f21b877cea989523fd0c295c7466b3cab1c0f323be92db3371acd"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
